=programming =graphics =algorithms
Current 3d rendering works as follows:
There's an
array of triangles.
Each triangle has pointers to 3 elements of a
vertex array.
Each vertex has a texture position and possibly extra
data.
The pixels in the triangle are found.
The triangle's depth at each pixel is tested against the z-buffer to check if it's
visible.
For each pixel, the texture position and other data are
interpolated between the triangle vertices.
The interpolated texture
position is used to access one or more textures. The texture pixels closest
to the desired point are interpolated.
The interpolated
texture data is input to a shader.
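Here's a rough sketch of those data structures and the per-pixel interpolation. The names are made up rather than taken from any real API, and perspective correction is skipped to keep it short:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical structs for the pipeline described above (not any real API).
struct Vertex {
    float position[3];  // XYZ in model space
    float uv[2];        // texture position
    // ...possibly extra data: normal, tangent, bone weights, etc.
};

struct Triangle {
    uint32_t vertex_index[3];  // points at 3 elements of the vertex array
};

struct Mesh {
    std::vector<Vertex>   vertices;
    std::vector<Triangle> triangles;
};

// For each covered pixel, the depth is tested against the z-buffer, and the
// vertex attributes are blended with barycentric weights. (Perspective
// correction is skipped here to keep the sketch short.)
struct Fragment {
    float depth;
    float uv[2];  // fed to the shader, which samples one or more textures
};

Fragment interpolate(const Mesh& mesh, const Triangle& tri,
                     const float bary[3], const float vertex_depths[3]) {
    Fragment f{};
    for (int i = 0; i < 3; ++i) {
        const Vertex& v = mesh.vertices[tri.vertex_index[i]];
        f.depth += bary[i] * vertex_depths[i];
        f.uv[0] += bary[i] * v.uv[0];
        f.uv[1] += bary[i] * v.uv[1];
    }
    return f;
}
```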
What if you just put the texture
data in the vertex? People do that! Using vertex colors that way has been
somewhat common in computer graphics, and is known to have better
performance. The main disadvantage is that resolution is lower. But what if
you just use more vertices? If you have a vertex per pixel, then vertex
colors have as much resolution as texture lookups. Why not do that?
That used to be too many triangles for GPUs. Now, it isn't.
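A sketch of what the vertex format looks like in that case; the layout is hypothetical, not any particular engine's:

```cpp
#include <cstdint>

// Hypothetical vertex layout: the color lives in the vertex, so there are no
// UVs and no per-pixel texture fetch.
struct ColoredVertex {
    float   position[3];  // XYZ
    uint8_t rgb[3];       // the data a texture lookup would have provided
};

// Per-pixel work reduces to blending the three vertex colors with barycentric
// weights. With roughly one vertex per screen pixel, this gives about the same
// effective resolution as sampling a texture.
void shade_pixel(const ColoredVertex v[3], const float bary[3], float out_rgb[3]) {
    for (int c = 0; c < 3; ++c) {
        out_rgb[c] = (bary[0] * v[0].rgb[c] +
                      bary[1] * v[1].rgb[c] +
                      bary[2] * v[2].rgb[c]) / 255.0f;
    }
}
```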
Animation
with that many vertices gets computationally expensive. But we can solve
this by animating models with fewer vertices, then doing polygon
tessellation. This is often called
micropolygon
rendering.
Micropolygons are sometimes used with displacement
mapping, which replaces
normal maps with
textures that indicate offsets along the surface normal. Each micropolygon has a lookup into
the displacement map texture, with the texture position interpolated from
its parents. The micropolygons generated by tessellation don't have
vertex data. All they have is interpolated data from their parent vertices.
But why not give them vertex data with colors and displacement, instead of
using textures?
Obviously it's easier to add displacement maps to
existing models, but in theory, replacing textures with vertex colors on
micropolygons should be a viable system.
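A sketch of what a generated vertex could carry in that scheme; again, the layout is hypothetical:

```cpp
#include <cstdint>

// Hypothetical micropolygon vertex: instead of interpolated UVs driving a
// displacement-map lookup and a color-texture lookup, the generated vertex
// stores its displacement and color directly.
struct SubVertex {
    float   displacement;  // offset along the interpolated surface normal
    uint8_t rgb[3];        // per-vertex color, replacing a texture sample
};

// The base position and normal are interpolated from the animated parent
// vertices; the stored displacement then pushes the point off the coarse
// surface, like a displacement map would.
void apply_subvertex(const float base_pos[3], const float base_normal[3],
                     const SubVertex& sv, float out_pos[3]) {
    for (int i = 0; i < 3; ++i) {
        out_pos[i] = base_pos[i] + base_normal[i] * sv.displacement;
    }
}
```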
The existing texture-based
systems work very well. Why is there any need to consider alternate systems?
Unreal had similar thoughts behind its creation of
Nanite, so we can see what they have to say.
That doesn't use
tessellation; instead, it uses very high-poly-count models with hierarchical
levels of detail. For each piece of a model, Nanite determines if it's
visible using
occlusion culling, determines how large it appears, and then chooses a model
version with the appropriate number of polygons.
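Here's a toy sketch of that kind of per-piece LOD selection. This is just the general idea, not Nanite's actual algorithm, and the one-triangle-per-pixel target is my assumption:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy LOD selection (not Nanite's actual algorithm): estimate how large a
// piece of the model appears on screen, then pick the precomputed version
// whose triangle count roughly matches that size.
struct LodLevel {
    std::size_t triangle_count;
    // ...geometry for this level
};

std::size_t choose_lod(const std::vector<LodLevel>& levels,  // sorted coarsest first
                       float bounding_radius_world,
                       float distance_to_camera,
                       float screen_height_pixels,
                       float vertical_fov_radians) {
    // Approximate projected height of the piece, in pixels.
    float projected = bounding_radius_world /
                      (distance_to_camera * std::tan(vertical_fov_radians * 0.5f)) *
                      screen_height_pixels;
    // Hypothetical target: roughly one triangle per covered pixel.
    float target_triangles = projected * projected;
    std::size_t chosen = 0;
    for (std::size_t i = 0; i < levels.size(); ++i) {
        if (levels[i].triangle_count <= target_triangles) chosen = i;
    }
    return chosen;  // pieces that fail occlusion culling would be skipped entirely
}
```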
As
that Nanite page notes, normal maps take more data than adding enough polygons
to make them unnecessary. Modern games take a lot of storage space, and this
is primarily because of high-resolution textures. Reducing the data used for
3d models could significantly reduce download and loading times.
You
also, of course, get the advantages of a real high-poly model over normal
maps: the fake displacement of normal maps looks wrong if you look closely
from a steep angle, and doesn't cast shadows correctly.
Nanite is
also in some ways easier for artists. A typical art workflow for static
environments involves:
- sculpting a high-poly untextured model
- polygon reduction
- UV unwrapping
- texturing
With Nanite, you can use the
high-poly sculpt directly, eliminating the polygon reduction step; that step is fairly
automated these days, but skipping it still saves some time.
But the Nanite approach has some disadvantages
compared to tessellation. A big one is that Nanite does not support
animation with bones or blend shapes. It also uses more data: a regular
vertex has XYZ positions, while a generated vertex only needs displacement.
(Also, because you can store displacement as a % of the distance between
parent vertices, you can use fewer bits for it, but this isn't really
necessary.)
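For example, the displacement could be quantized as a signed fraction of the parent edge length, something like this (the 8-bit width here is arbitrary):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the optional compression mentioned above: store displacement as a
// signed fraction of the distance between the parent vertices, quantized to
// 8 bits.
int8_t encode_displacement(float displacement, float parent_edge_length) {
    float fraction = displacement / parent_edge_length;
    fraction = std::clamp(fraction, -1.0f, 1.0f);
    return static_cast<int8_t>(fraction * 127.0f);
}

float decode_displacement(int8_t stored, float parent_edge_length) {
    return (stored / 127.0f) * parent_edge_length;
}
```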
In addition to eliminating the polygon reduction step, I
also want to eliminate the UV unwrapping step. One approach to doing this is
"mesh colors".
That uses vertex colors to generate textures for models. This has a
performance penalty, but has some potentially significant advantages for
artists.
Why can't we just use vertex colors with Nanite instead of
textures? Well, we can, but that doesn't get the full advantages of mipmaps.
Suppose you have a colorful painting, and you zoom out until it's a single
pixel on the screen. The correct color for that pixel is the average of the
whole painting, not a random pixel from the high-resolution version. Mipmaps
store pre-averaged colors that make this easy. Nanite has multiple levels of
detail, but that's not enough to replace mipmaps.
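Mipmap generation is just repeated 2x2 averaging, so the smallest level ends up holding the average of the whole image. A minimal sketch (assuming even dimensions):

```cpp
#include <cstddef>
#include <vector>

// Build one mip level: each output pixel is the average of a 2x2 block of the
// level above. Repeating this down to 1x1 yields the whole-image average.
// Assumes width and height are even, packed RGB floats.
std::vector<float> downsample_rgb(const std::vector<float>& src,
                                  std::size_t w, std::size_t h) {
    std::size_t ow = w / 2, oh = h / 2;
    std::vector<float> dst(ow * oh * 3);
    for (std::size_t y = 0; y < oh; ++y) {
        for (std::size_t x = 0; x < ow; ++x) {
            for (int c = 0; c < 3; ++c) {
                float sum = src[((2 * y) * w + 2 * x) * 3 + c]
                          + src[((2 * y) * w + 2 * x + 1) * 3 + c]
                          + src[((2 * y + 1) * w + 2 * x) * 3 + c]
                          + src[((2 * y + 1) * w + 2 * x + 1) * 3 + c];
                dst[(y * ow + x) * 3 + c] = sum * 0.25f;
            }
        }
    }
    return dst;
}
```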
The viability of
mesh colors suggests a possible solution: generate pseudotextures
dynamically. To render a model, choose an LOD that has about one vertex
per pixel; in general, some areas will then have several vertices per pixel.
First, render that model LOD at a higher resolution than the display
resolution. Some blocks of this render might have higher resolution than
others, depending on the vertex density in each block. Then, use that render
as a texture: blur it, sample the blurred data at each vertex's
location in screen space, and cache that sampled data in the vertices.
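A sketch of that cache-update step, assuming the blurred screen-space render is already available; all names here are placeholders:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical data for the dynamic-pseudotexture pass described above.
struct ScreenSpaceRender {          // the over-sampled, blurred, unlit render
    std::vector<float> rgb;         // packed RGB
    std::size_t width = 0, height = 0;
};

struct CachedVertex {
    float screen_x = 0, screen_y = 0;   // projected position of the vertex
    float cached_rgb[3] = {0, 0, 0};    // filtered data cached in the vertex
};

// Bilinear lookup into the blurred render: this is the "sample at each vertex
// location in screen space" step. Assumes width, height >= 2 and x, y >= 0.
void sample_rgb(const ScreenSpaceRender& tex, float x, float y, float out[3]) {
    std::size_t x0 = std::min(static_cast<std::size_t>(x), tex.width - 2);
    std::size_t y0 = std::min(static_cast<std::size_t>(y), tex.height - 2);
    float fx = x - x0, fy = y - y0;
    for (int c = 0; c < 3; ++c) {
        auto at = [&](std::size_t px, std::size_t py) {
            return tex.rgb[(py * tex.width + px) * 3 + c];
        };
        out[c] = (1 - fx) * (1 - fy) * at(x0, y0) + fx * (1 - fy) * at(x0 + 1, y0)
               + (1 - fx) * fy * at(x0, y0 + 1) + fx * fy * at(x0 + 1, y0 + 1);
    }
}

// Given the blurred, unlit, over-sampled render of the chosen LOD, sample it
// at each vertex's screen position and store the result in the vertex cache.
void update_vertex_cache(const ScreenSpaceRender& blurred_render,
                         std::vector<CachedVertex>& vertices) {
    for (CachedVertex& v : vertices) {
        sample_rgb(blurred_render, v.screen_x, v.screen_y, v.cached_rgb);
    }
}
```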
That approach should result in perfect anisotropic filtering if the
texture matches the model orientation. It's desirable to only recalculate
the cached values occasionally, so there would be some mismatch. But
calculating this data is relatively fast: while the resolution is higher, no
lighting or shaders are applied, so rendering would be equivalent to that of
an unlit vertex-colored model, which is very fast. If these cached values
are calculated from 16x the pixels of the base resolution, and recalculated
every 16 frames, the performance cost would be relatively small, and the
quality should be good enough. But we can do better, by prioritizing
recalculation of cached values for models with the greatest change in
orientation relative to the camera.
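A sketch of that scheduling policy, assuming each model keeps track of the camera-relative direction it was last cached at; the per-frame budget is a parameter:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Prioritized cache refreshes: each frame, only a limited number of models
// recompute their cached vertex data, and models whose orientation relative
// to the camera has drifted the most go first.
struct ModelCacheState {
    int   id = 0;
    float view_dir_at_cache[3] = {0, 0, 1};  // model-to-camera direction when last cached
    float view_dir_now[3]      = {0, 0, 1};  // model-to-camera direction this frame
};

static float orientation_drift(const ModelCacheState& m) {
    // 1 - cos(angle) between the cached and current view directions.
    float dot = 0;
    for (int i = 0; i < 3; ++i) dot += m.view_dir_at_cache[i] * m.view_dir_now[i];
    return 1.0f - dot;
}

std::vector<int> pick_models_to_refresh(std::vector<ModelCacheState> models,
                                        std::size_t budget_per_frame) {
    std::sort(models.begin(), models.end(),
              [](const ModelCacheState& a, const ModelCacheState& b) {
                  return orientation_drift(a) > orientation_drift(b);
              });
    std::vector<int> chosen;
    for (std::size_t i = 0; i < models.size() && i < budget_per_frame; ++i) {
        chosen.push_back(models[i].id);
    }
    return chosen;
}
```

With the numbers above (recalculating each model's cache about every 16 frames on average), the budget would be roughly one-sixteenth of the visible models per frame.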
In terms of data layout, the
above approach might look like this:
A mesh has the following arrays:
- parent triangles: indices for 3 parent vertices, indices for 3 subvertices, a pointer to a vertex data array, 4 subtriangle indices
- subtriangles: indices for 3 subvertices, 4 subtriangle indices
- parent vertices: XYZ positions, a vertex data index, cached vertex data
- subvertices: displacement, a vertex data index, cached vertex data
Vertex data arrays would be
shared across models. A vertex data element is accessed with the pointer in
a triangle and an index in the vertex. Different vertex data arrays may have
different amounts of data per vertex; a typical element might include 7
numbers: RGB, specularity, RGB emission. The cached vertex data has the same
size as the source vertex data, but is unique to the mesh rather than
shared.
The pointers to vertex data arrays are per-triangle instead
of per-mesh so that multiple arrays of vertex data can be used in a single
mesh, to enable better reuse across models. The cached vertex data stored at
each vertex element would then have the largest element size of any of the
vertex data arrays used by the mesh.
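Written as C++-style structs, that layout might look roughly like this. The field choices follow the example values above, and the fixed-size cached element is a simplification; this is only a sketch:

```cpp
#include <cstdint>
#include <vector>

// One element of a shared vertex data array: RGB, specularity, RGB emission
// (the 7-number example from the text).
struct VertexData {
    float rgb[3];
    float specularity;
    float emission_rgb[3];
};
using VertexDataArray = std::vector<VertexData>;  // shared across models

struct ParentVertex {
    float      position[3];        // XYZ
    uint32_t   vertex_data_index;  // index into the triangle's vertex data array
    VertexData cached;             // cached (filtered) data, unique to the mesh
};

struct SubVertex {
    float      displacement;       // offset from the interpolated parent surface
    uint32_t   vertex_data_index;
    VertexData cached;
};

struct SubTriangle {
    uint32_t subvertex[3];  // indices into the mesh's subvertex array
    uint32_t child[4];      // indices of 4 child subtriangles
};

struct ParentTriangle {
    uint32_t parent_vertex[3];           // indices into the parent vertex array
    uint32_t subvertex[3];               // indices into the subvertex array
    const VertexDataArray* vertex_data;  // which shared array this triangle uses
    uint32_t child[4];                   // indices of 4 child subtriangles
};

struct Mesh {
    std::vector<ParentTriangle> parent_triangles;
    std::vector<SubTriangle>    subtriangles;
    std::vector<ParentVertex>   parent_vertices;
    std::vector<SubVertex>      subvertices;
};
```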
I think this is a viable rendering system! Maybe Epic or Microsoft or AMD or NVIDIA should give it a try!